Document Image Classification Via AdaBoost and ECOC Strategies Based on SVM Learners

Authors

  • Mehmet Ahat
  • Cagdas Ulas
  • Onur Agin
Abstract

In this paper, we describe easily extractable features and an approach for document image retrieval and classification at the spatial level. The approach is content-based: by exploiting visual similarity, it classifies noisy text document images at high speed without optical character recognition (OCR). Our method combines a bag-of-visual-words (BoVW) model built on the designed descriptors with a RandomWindow (RW) technique that captures the structural relationships of the spatial layout. Using features based on this information, we analyze different multiclass classification methods as well as ensemble classifier methods with a Support Vector Machine (SVM) as the base learner. The results demonstrate that the proposed method for obtaining structural relations is competitive for noisy document image categorization.
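The abstract does not spell out how the BoVW histograms or the random windows are computed, so the following is only a minimal sketch of the general idea under stated assumptions: descriptors are plain numeric vectors, the codebook is already given, and windows are axis-aligned rectangles sampled uniformly over the page. The names `nearest_word`, `bovw_histogram`, and `random_window_features` are hypothetical and are not taken from the paper.

```python
import random

def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor
    (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bovw_histogram(descriptors, codebook):
    """L1-normalized histogram of visual-word assignments.
    Returns all zeros when no descriptors are given."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def random_window_features(points, descriptors, codebook, page_size, n_windows, rng):
    """Assumed variant of a random-window scheme: sample rectangles uniformly
    over the page and concatenate the BoVW histograms of the descriptors
    whose keypoints fall inside each rectangle."""
    width, height = page_size
    features = []
    for _ in range(n_windows):
        x0, x1 = sorted(rng.uniform(0, width) for _ in range(2))
        y0, y1 = sorted(rng.uniform(0, height) for _ in range(2))
        inside = [
            d for (x, y), d in zip(points, descriptors)
            if x0 <= x <= x1 and y0 <= y <= y1
        ]
        features.extend(bovw_histogram(inside, codebook))
    return features

# Toy data: three keypoints with 2-D descriptors and a two-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
points = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]]

hist = bovw_histogram(descriptors, codebook)
feats = random_window_features(
    points, descriptors, codebook, (100.0, 100.0), 4, random.Random(0)
)
```

The resulting fixed-length vector (here 4 windows times 2 visual words) is the kind of input a multiclass SVM setup could consume; the choice of window count and sampling distribution is an assumption of this sketch.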


Similar resources

AdaBoost and Support Vector Machines for Unbalanced Data Sets

Boosting is a method for improving the accuracy of a given learning algorithm by combining multiple weak learners to “boost” them into a strong learner. The gist of AdaBoost is the assumption that even though a weak learner cannot do well on all classifications, each one is good at some subsets of the given data with a certain bias, so that by assembling many weak learners together, ...


Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...


Empirical analysis of support vector machine ensemble classifiers

Ensemble classification – combining the results of a set of base learners – has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Compared with neural network or decision tree ensembles, there is no comprehensive empirical research in support vector machine (SVM) ensembles. To fill this void, this paper an...


Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...


Traffic sign classification using error correcting techniques

Traffic sign classification is a challenging problem in Computer Vision due to the high variability of sign appearance in uncontrolled environments. Lack of visibility, illumination changes, and partial occlusions are just a few of the problems. In this paper, we introduce a classification technique for traffic sign recognition by means of Error Correcting Output Codes. Recently, new proposals of cod...




Publication date: 2014